Trees and Nets


Kerry Back

BUSI 520, Fall 2022
JGSB, Rice University

Trees

Decision tree

  • Split sample successively into smaller subsamples by answering “yes-no” questions.
  • Each question is based on a single variable: is it above a threshold?
  • Split a specified number of times (depth). Final subsamples are called leaves.
  • For regression problems, the prediction for each observation is the mean of the training observations that end up in the same leaf.
  • For classification problems the prediction is the category that occurs most frequently in the leaf.
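The leaf-mean rule above can be sketched with a depth-1 regression tree (a “stump”) on assumed toy data: every observation landing in a leaf gets that leaf’s training mean as its prediction.

```python
# Minimal sketch (toy data, single split): a depth-1 regression tree.
# Observations in the same leaf all get that leaf's training mean.

def stump_predict(x_train, y_train, threshold, x_new):
    """Split on one threshold; predict the mean of the leaf x_new falls in."""
    left = [y for x, y in zip(x_train, y_train) if x <= threshold]
    right = [y for x, y in zip(x_train, y_train) if x > threshold]
    leaf = left if x_new <= threshold else right
    return sum(leaf) / len(leaf)

x_train = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y_train = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
print(stump_predict(x_train, y_train, 5.0, 2.5))   # mean of the left leaf
print(stump_predict(x_train, y_train, 5.0, 11.0))  # mean of the right leaf
```

A real tree repeats this split recursively inside each leaf up to the specified depth.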

Example of a tree

Decision tree splitting for regression

  • At each step, the algorithm chooses a variable and threshold to split on.
  • In a regression problem, the standard decision criterion is SSE.
    • Error = actual - mean of subset
    • Outliers tend to get separated into their own subsets (less so under the sum of absolute errors, SAE)
  • There are also other relatively standard criteria, for example in scikit-learn.
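The SSE criterion can be sketched directly (toy data assumed): for each candidate threshold, compute each subset’s squared errors around its own mean and pick the threshold minimizing the total.

```python
# Sketch of the SSE splitting criterion on toy data: choose the threshold
# that minimizes total sum of squared errors across the two subsets.

def sse(ys):
    """Sum of squared errors around the subset mean."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(xs, ys):
    """Try each midpoint between sorted x values; return (threshold, total SSE)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (t, total)
    return best

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
print(best_split(xs, ys))  # splits between the two clusters of x values
```

Real implementations search over all variables as well as all thresholds; this sketch has only one predictor.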

Decision tree splitting for classification

  • Let \(k\) = number of classes. Suppose a split results in \(n\) observations in a subset. Let \(x_i=\) fraction in class \(i\).
  • Default criterion for scikit-learn is Gini = \(1 - \sum_{i=1}^k x_i^2\)
  • If we consider \(\sum x_i^2\) subject to \(\sum x_i=1\), the value is minimized at \(x_i=1/k\) and maximized at boundaries: some \(x_i=1\) and the others equal 0.
  • So, minimizing Gini means trying to create “pure” subsets (all observations of the same class).
  • There are also other standard criteria, for example entropy and log loss.
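The Gini formula is easy to verify on small examples (toy labels assumed): a pure subset scores 0, and an even two-way mix scores \(1 - 1/2 = 0.5\).

```python
# Sketch of Gini impurity: 1 minus the sum of squared class fractions.
# A pure subset scores 0; an even k-way mix scores 1 - 1/k.

def gini(labels):
    n = len(labels)
    fracs = [labels.count(c) / n for c in set(labels)]
    return 1.0 - sum(f ** 2 for f in fracs)

print(gini(["a", "a", "a", "a"]))  # pure subset: 0.0
print(gini(["a", "b"]))            # even 2-way split: 0.5
```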

Random forest

  • Multiple trees fit to random data
  • Data for each tree is a bootstrapped sample:
    • random selection of rows (with replacement)
    • same size as original sample
  • Prediction is average of predictions of the trees
  • Hyperparameters = number of trees and depth of trees
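The bootstrap-and-average idea can be sketched with stand-in models (a sample mean in place of a fitted tree, purely for brevity; the data are toy values):

```python
# Sketch of the random-forest recipe, with a sample mean standing in for
# each fitted tree: fit each model to a bootstrap sample, then average.

import random

def fit_stump_mean(ys):
    # Stand-in for a fitted tree: just the sample mean (a depth-0 "tree").
    return sum(ys) / len(ys)

def forest_predict(ys, n_trees=100, seed=0):
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trees):
        # Bootstrap: rows drawn with replacement, same size as the original.
        boot = [rng.choice(ys) for _ in ys]
        preds.append(fit_stump_mean(boot))
    # Forest prediction = average of the individual predictions.
    return sum(preds) / n_trees

ys = [1.0, 2.0, 3.0, 4.0]
print(forest_predict(ys))  # close to the overall mean 2.5
```

With real trees (e.g., scikit-learn’s `RandomForestRegressor`), each bootstrap fit differs enough that averaging reduces variance.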

Boosting

  • Boosting means combining models, for example trees.
  • Prediction is sum of individual predictions.
  • Gradient boosting fits a second model to the errors of the first, a third to the errors of the first two combined, …
  • Adaptive boosting (AdaBoost) fits the original target but overweights observations that prior models got wrong
  • Learning rate (weight to put on new model) is a hyperparameter
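The gradient-boosting loop can be sketched with the simplest possible base model, a constant (the residual mean); the data and learning rate are assumed for illustration:

```python
# Sketch of gradient boosting with constant base models on toy data:
# each round fits the residuals of the current combined prediction, and
# the new model is added in scaled by the learning rate.

def boost(ys, n_rounds=50, learning_rate=0.5):
    pred = 0.0
    for _ in range(n_rounds):
        residuals = [y - pred for y in ys]
        model = sum(residuals) / len(residuals)  # "fit" to current errors
        pred += learning_rate * model            # prediction is a sum of models
    return pred

ys = [2.0, 4.0, 6.0]
print(boost(ys))  # converges toward the target mean, 4.0
```

With trees as base models (e.g., scikit-learn’s `GradientBoostingRegressor`), each round fits a shallow tree to the residuals rather than a single constant.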

Neural networks

Multi-layer perceptrons

  • A multi-layer perceptron (MLP) consists of “neurons” arranged in layers.
  • A neuron is a mathematical function. It takes inputs \(x_1, \ldots, x_n\), calculates a function \(y=f(x_1, \ldots, x_n)\) and passes \(y\) to the neurons in the next layer.
  • The inputs in the first layer are the predictors.
  • The inputs in successive layers are the calculations from the prior layer.
  • The last layer is a single neuron that produces the output.
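A forward pass through a small MLP can be written out directly. All weights and biases below are assumed, chosen only for illustration: one hidden layer of ReLU neurons feeding a linear output neuron.

```python
# Minimal sketch of an MLP forward pass (toy weights): one hidden layer
# of ReLU neurons, then a single linear output neuron.

def relu(z):
    return max(0.0, z)

def forward(x, hidden_params, out_b, out_w):
    # Each hidden neuron: ReLU of its own bias + weighted sum of the inputs.
    h = [relu(b + sum(w * xi for w, xi in zip(ws, x))) for b, ws in hidden_params]
    # Output neuron: linear function of the hidden layer's values.
    return out_b + sum(w * hi for w, hi in zip(out_w, h))

hidden = [(0.0, [1.0, -1.0]),   # neuron 1: bias 0, weights on the 2 inputs
          (-1.0, [0.5, 0.5])]   # neuron 2: bias -1
print(forward([2.0, 1.0], hidden, 0.0, [1.0, 2.0]))  # 2.0
```

Training means choosing all of these biases and weights to fit the data; libraries like scikit-learn’s `MLPRegressor` do that by gradient descent.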

Illustration

  • input is \(x \in \mathbb{R}^4\)
  • functions \(f_1, \ldots, f_5\) of \(x\) are calculated (called “hidden layer”)
  • output is \(g(f_1(x), \ldots, f_5(x))\)

Rectified linear units

  • The usual function for the neurons (except in the last layer) is \[ y = \max(0,b+w_1x_1 + \cdots + w_nx_n)\] Parameters \(b\) (called bias) and \(w_1, \ldots w_n\) (called weights) are different for different neurons.
  • This function is called a rectified linear unit (ReLU).
  • There are other choices. In general, this function is called the “activation function.”

Analogy to neurons firing


If \(w_i>0\) and \(b<0\), then \(y>0\) only when \(x_i\) are large enough.


A neuron fires when it is sufficiently stimulated by signals from other neurons (in prior layer).

Neural net output function

For regression problems, the output function is linear

\[ y = b+w_1x_1 + \cdots + w_nx_n. \]

For classification, there is a linear function \(y_i\) for each class, and the output is the set of probabilities \(e^{y_i}/\sum_j e^{y_j}\) (the softmax function).
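The softmax step is a one-liner worth seeing on assumed toy scores: exponentiate each class’s linear output and normalize so the probabilities sum to one.

```python
# Sketch of the classification output: softmax of one linear score per class.

import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.0])  # toy linear outputs, one per class
print(probs)       # probabilities, largest for the highest score
print(sum(probs))  # 1.0 (up to rounding)
```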

Deep learning

  • Deep learning means a neural network with many layers. It is behind facial recognition, self-driving cars, …
  • Need a specialized library, probably TensorFlow (from Google) or PyTorch (from Facebook), and probably graphics processing units (GPUs) – i.e., run on video cards
  • Hands-On Machine Learning with Scikit-Learn and TensorFlow
  • Deep Learning for Coders with fastai and PyTorch (also fastai website)